- 
                Notifications
    You must be signed in to change notification settings 
- Fork 381
Expand FAILED_PRECONDITION error in NodeExpandVolume RPC call #431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Expand FAILED_PRECONDITION error in NodeExpandVolume RPC call #431
Conversation
| cc @bswartz | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
23d227b    to
    f72fbbe      
    Compare
  
    | The build failures on this branch are unrelated to this PR. Filed #434 for tracking. | 
898aec4    to
    bd02984      
    Compare
  
    | | Exceeds capabilities | 3 INVALID_ARGUMENT | Indicates that CO has specified capabilities not supported by the volume. | Caller MAY verify volume capabilities by calling ValidateVolumeCapabilities and retry with matching capabilities. | | ||
| | Volume does not exist | 5 NOT FOUND | Indicates that a volume corresponding to the specified volume_id does not exist. | Caller MUST verify that the volume_id is correct and that the volume is accessible and has not been deleted before retrying with exponential back off. | | ||
| | Volume in use | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged and the underlying filesystem does not support expansion of published or staged volumes. | Caller MUST NOT retry. | | ||
| | Volume in use or readonly | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged or readonly and the storage provider does not support expansion of published or staged or readonly volumes. | Caller MUST NOT retry. | | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| | Volume in use or readonly | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged or readonly and the storage provider does not support expansion of published or staged or readonly volumes. | Caller MUST NOT retry. | | |
| | Volume in use or readonly | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged or readonly and the storage provider does not support expansion of published or staged or readonly volumes. | Caller SHOULD ensure that volume is not node-published before retrying with exponential back off. | | 
|  | ||
| Otherwise `NodeExpandVolume` MUST be called after successful `NodePublishVolume`. | ||
|  | ||
| A plugin that has `STAGE_UNSTAGE_VOLUME` node capability and supports `NodeExpandVolume` ONLY after `NodeStageVolume` and before `NodePublishVolume` but not after `NodePublishVolume` may return `FAILED_PRECONDITION` error code if `NodeExpandVolume` is called after `NodePublishVolume` - CO MUST NOT retry if volume is node-published. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to say  CO MUST NOT retry if volume is node-published? Because that means offline is not supported?
| A plugin that has `STAGE_UNSTAGE_VOLUME` node capability and supports `NodeExpandVolume` ONLY after `NodeStageVolume` and before `NodePublishVolume` but not after `NodePublishVolume` may return `FAILED_PRECONDITION` error code if `NodeExpandVolume` is called after `NodePublishVolume` - CO MUST NOT retry if volume is node-published. | |
| An SP that has `STAGE_UNSTAGE_VOLUME` node capability and supports `NodeExpandVolume` ONLY after `NodeStageVolume` and before `NodePublishVolume` but not after `NodePublishVolume` may return `FAILED_PRECONDITION` error code if `NodeExpandVolume` is called after `NodePublishVolume` and, in response, the CO SHOULD ensure that the volume is is node-staged but not node-published before retrying with exponential backoff. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did mean "CO MUST NOT retry if volume is node-published". The reason being - if a SP does not allow expansion after node-publish and volume already has been node-published, the only way NodeExpandVolume can succeed after retry is - if they NodeUnpublish the volume and the only way CO can retry expansion is on another node after node-stage but before node-publish. For CO it is already too late to retry expansion on this node because NodeUnpublish  is very likely to be a unsafe operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that means offline is not supported?
I think it does not mean that - it means that CO must try expansion on another node after node-stage and before node-publish. It just means it can't happen on this node. Having said that - there is a slight problem with language:
If plugin has
STAGE_UNSTAGE_VOLUMEnode capability then:
NodeExpandVolumeMUST be called after successfulNodeStageVolume.
NodeExpandVolumeMAY be called before or afterNodePublishVolume.
I know Kubernetes always calls NodeExpandVolume after NodeStageVolume if NodeExpand is supported but in general a different CO may choose to interpret this differently and may skip calling NodeExpand between node-stage and node-publish and only call NodeExpand after node-publish in which case "offline" expansion may not work on that particular CO.   Even then I think recovery from volume in-use error on node shouldn't be "retry with exponential backoff"  because it is highly problematic to node-unpublish a volume or if volume is readonly on a node, make it RW.
I think we have two options:
- Make no spec change for this particular behaviour in NodeExpandVolume. The downside might be - another CO may interpret this spec differently and implement call toNodeExpandVolumedifferently and plugins might be dependent on k8s behaviour (although I don't know any plugins which want this behaviour).
- Make the language change around when NdoeExpandVolumecan be called. so something like - "it is desirable to call NodeExpandVolume before NodePublish to permit expansion of volumes which only permit expansion after node-stage and before node-publish and not after node-publish"
bd02984    to
    0d6a8ba      
    Compare
  
    
Expand the error from
NodeExpandVolumeto include readonly volume errors and add additional context about returningFAILED_PRECONDITIONerror ifNodeExpandVolumeis not supported for node-published volumes.